CORE: Automatic Molecule Optimization Using Copy & Refine Strategy
Molecule optimization is about generating molecules with more desirable
properties based on an input molecule. The state-of-the-art approaches
partition the molecules into a large set of substructures and grow the new
molecule structure by iteratively predicting which substructure from that set
to add. However, since the set of available substructures is large, such an
iterative prediction task is often inaccurate especially for substructures that
are infrequent in the training data. To address this challenge, we propose a
new generating strategy called "Copy & Refine" (CORE), where at each step the
generator first decides whether to copy an existing substructure from the input
or to generate a new substructure; the most promising substructure is then
added to the new molecule. Combined with scaffolding tree generation
and adversarial training, CORE can significantly improve several latest
molecule optimization methods in various measures including drug likeness
(QED), dopamine receptor (DRD2) and penalized LogP. We tested CORE and
baselines using the ZINC database; CORE obtained up to 11% and 21% relative
improvement over the baselines in success rate on the complete test set and on
the subset with infrequent substructures, respectively.
Comment: Accepted by AAAI 2020
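The copy-or-generate decision described above can be sketched as a toy scoring step. Everything here (function names, the hard 0.5 gate threshold, the scores) is illustrative, not the paper's actual architecture:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def copy_or_refine_step(input_subs, vocab, copy_scores, gen_scores, copy_gate):
    """One hypothetical Copy & Refine step: if the learned gate favors
    copying, pick the best-scoring substructure from the input molecule;
    otherwise generate the best-scoring substructure from the full
    vocabulary. Scores would come from the generator network."""
    if copy_gate >= 0.5:
        probs = softmax(copy_scores)   # distribution over input substructures
        pool = input_subs
    else:
        probs = softmax(gen_scores)    # distribution over the whole vocabulary
        pool = vocab
    best = max(range(len(pool)), key=lambda i: probs[i])
    return pool[best]
```

Copying restricts the prediction to the (small) set of input substructures, which is the mechanism the abstract credits for better handling of infrequent substructures.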
Sample Efficiency Matters: A Benchmark for Practical Molecular Optimization
Molecular optimization is a fundamental goal in the chemical sciences and is
of central interest to drug and material design. In recent years, significant
progress has been made in solving challenging problems across various aspects
of computational molecular optimization, emphasizing high validity, diversity,
and, most recently, synthesizability. Despite this progress, many papers report
results on trivial or self-designed tasks, bringing additional challenges to
directly assessing the performance of new methods. Moreover, the sample
efficiency of the optimization--the number of molecules evaluated by the
oracle--is rarely discussed, despite being an essential consideration for
realistic discovery applications.
To fill this gap, we have created an open-source benchmark for practical
molecular optimization, PMO, to facilitate the transparent and reproducible
evaluation of algorithmic advances in molecular optimization. This paper
thoroughly investigates the performance of 25 molecular design algorithms on 23
tasks with a particular focus on sample efficiency. Our results show that most
"state-of-the-art" methods fail to outperform their predecessors under a
limited oracle budget allowing 10K queries and that no existing algorithm can
efficiently solve certain molecular optimization problems in this setting. We
analyze the influence of the optimization algorithm choices, molecular assembly
strategies, and oracle landscapes on the optimization performance to inform
future algorithm development and benchmarking. PMO provides a standardized
experimental setup to comprehensively evaluate and compare new molecule
optimization methods with existing ones. All code can be found at
https://github.com/wenhao-gao/mol_opt
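The oracle-budget protocol at the heart of PMO can be mimicked with a small wrapper; `BudgetedOracle` and the toy length-scoring oracle below are illustrative, not PMO's actual API:

```python
class BudgetedOracle:
    """Wrap a property oracle and enforce a fixed query budget, so an
    optimizer's best-found score can be compared at equal oracle cost."""

    def __init__(self, oracle_fn, budget):
        self.oracle_fn = oracle_fn
        self.budget = budget
        self.calls = 0
        self.best = float("-inf")

    def __call__(self, molecule):
        if self.calls >= self.budget:
            raise RuntimeError("oracle budget exhausted")
        self.calls += 1
        score = self.oracle_fn(molecule)
        self.best = max(self.best, score)   # track best score seen so far
        return score
```

Scoring an optimizer by `oracle.best` once the budget (e.g. 10K queries) runs out makes sample efficiency explicit, rather than letting methods query the oracle an unbounded number of times.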
Quasi-Newton Hamiltonian Monte Carlo
The Hamiltonian Monte Carlo (HMC) method has become increasingly popular in recent years. It is a state-of-the-art MCMC sampler because it explores the parameter space more efficiently than standard random-walk proposals. The key idea behind HMC is that it makes use of first-order gradient information about the target distribution. In this paper, we propose a novel dynamics that uses second-order geometric information about the desired distribution. The second-order information is estimated with a quasi-Newton method (e.g., the BFGS method), so it does not impose a heavy computational burden. Moreover, our theoretical analysis guarantees that the dynamics leaves the target distribution invariant. As a result, the proposed quasi-Newton Hamiltonian Monte Carlo (QNHMC) algorithm traverses the parameter space more efficiently than standard HMC and produces a less correlated series of samples. Finally, empirical evaluation on simulated data verifies the effectiveness and efficiency of our approach. We also apply QNHMC to Bayesian logistic regression and online Bayesian matrix factorization problems.
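For reference, the baseline that QNHMC extends looks roughly like the sketch below: a 1-D leapfrog HMC with an identity mass matrix. In QNHMC the dynamics would instead be preconditioned by a BFGS-style Hessian estimate; that part is not shown here.

```python
import math
import random

def hmc_sample(grad_u, u, q0, eps, n_leapfrog, n_samples, rng):
    """Minimal 1-D Hamiltonian Monte Carlo. u is the negative log
    density (potential energy), grad_u its derivative."""
    q = q0
    samples = []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)               # resample momentum
        q_new, p_new = q, p
        # leapfrog integration of the Hamiltonian dynamics
        p_new -= 0.5 * eps * grad_u(q_new)
        for _ in range(n_leapfrog - 1):
            q_new += eps * p_new
            p_new -= eps * grad_u(q_new)
        q_new += eps * p_new
        p_new -= 0.5 * eps * grad_u(q_new)
        # Metropolis correction keeps the target distribution invariant
        h_old = u(q) + 0.5 * p * p
        h_new = u(q_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(q)
    return samples
```

Run against a standard normal target (`u(q) = q**2 / 2`, `grad_u(q) = q`), the chain recovers mean 0 and unit variance; QNHMC's claim is that second-order preconditioning reaches such agreement with fewer, less correlated samples.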
MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization
Molecule optimization is a fundamental task for accelerating drug discovery,
with the goal of generating new valid molecules that maximize multiple drug
properties while maintaining similarity to the input molecule. Existing
generative models and reinforcement learning approaches made initial success,
but still face difficulties in simultaneously optimizing multiple drug
properties. To address these challenges, we propose the MultI-constraint
MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input
molecule as an initial guess and samples molecules from the target
distribution. MIMOSA first pretrains two property-agnostic graph neural
networks (GNNs) for molecule topology and substructure-type prediction, where a
substructure can be either an atom or a single ring. In each iteration, MIMOSA
uses the GNNs' predictions and
employs three basic substructure operations (add, replace, delete) to generate
new molecules and associated weights. The weights can encode multiple
constraints, including similarity and drug-property constraints, based on which
we select promising molecules for the next iteration. MIMOSA enables flexible
encoding of multiple property and similarity constraints and efficiently
generates new molecules that satisfy various property constraints, achieving up
to 49.6% relative improvement over the best baseline in terms of success rate.
Comment: Accepted by AAAI 2021
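The add/replace/delete sampling loop can be illustrated with a toy Metropolis-style chain over substructure lists. The similarity, proposal, and weight functions below are simplified stand-ins for the paper's GNN-guided components, and the weight combining property score with input similarity is only schematic:

```python
import math
import random

def similarity(mol, ref):
    """Toy Jaccard similarity between two substructure sets."""
    a, b = set(mol), set(ref)
    return len(a & b) / max(1, len(a | b))

def propose(mol, vocab, rng):
    """Apply one of the three basic substructure edits: add, replace,
    or delete. An empty molecule always gets an addition."""
    mol = list(mol)
    op = rng.choice(["add", "replace", "delete"])
    if op == "add" or not mol:
        mol.append(rng.choice(vocab))
    elif op == "replace":
        mol[rng.randrange(len(mol))] = rng.choice(vocab)
    else:
        mol.pop(rng.randrange(len(mol)))
    return mol

def sample(ref, vocab, prop_fn, n_iters, rng, lam=2.0):
    """Metropolis-style chain whose weight multiplies a property score
    with similarity to the input molecule, so accepted molecules must
    balance both constraints; returns the best molecule found."""
    cur, best = list(ref), list(ref)

    def weight(m):
        return math.exp(prop_fn(m) + lam * similarity(m, ref))

    for _ in range(n_iters):
        cand = propose(cur, vocab, rng)
        if rng.random() < min(1.0, weight(cand) / weight(cur)):
            cur = cand
            if prop_fn(cur) > prop_fn(best):
                best = cur
    return best
```

Because the weight is multiplicative in the constraints, additional properties can be folded in by extending the exponent, which is the flexibility the abstract highlights.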
Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems
Advances in artificial intelligence (AI) are fueling a new paradigm of
discoveries in natural sciences. Today, AI has started to advance natural
sciences by improving, accelerating, and enabling our understanding of natural
phenomena at a wide range of spatial and temporal scales, giving rise to a new
area of research known as AI for science (AI4Science). Being an emerging
research paradigm, AI4Science is unique in that it is an enormous and highly
interdisciplinary area. Thus, a unified and technical treatment of this field
is needed yet challenging. This work aims to provide a technically thorough
account of a subarea of AI4Science; namely, AI for quantum, atomistic, and
continuum systems. These areas aim at understanding the physical world from the
subatomic (wavefunctions and electron density), atomic (molecules, proteins,
materials, and interactions), to macro (fluids, climate, and subsurface) scales
and form an important subarea of AI4Science. A unique advantage of focusing on
these areas is that they largely share a common set of challenges, thereby
allowing a unified and foundational treatment. A key common challenge is how to
capture physics first principles, especially symmetries, in natural systems by
deep learning methods. We provide an in-depth yet intuitive account of
techniques to achieve equivariance to symmetry transformations. We also discuss
other common technical challenges, including explainability,
out-of-distribution generalization, knowledge transfer with foundation and
large language models, and uncertainty quantification. To facilitate learning
and education, we provide categorized lists of resources that we found to be
useful. We strive to be thorough and unified, and we hope this initial effort
may trigger more community interest and effort to further advance AI4Science.
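As a concrete illustration of the equivariance property discussed above, the snippet below numerically checks that a simple map commutes with 2-D rotations. This is a minimal sketch: real equivariant networks in AI4Science handle 3-D rotations, translations, and permutations, not a single planar rotation.

```python
import math

def rotate(v, theta):
    """Rotate a 2-D vector by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

def scale_by_norm(v):
    """A rotation-equivariant map: scaling a vector by its own norm
    commutes with rotation because the norm is rotation-invariant."""
    n = math.hypot(v[0], v[1])
    return [n * v[0], n * v[1]]

def is_equivariant(f, v, theta, tol=1e-9):
    """Check f(R v) == R f(v) numerically for one rotation R."""
    lhs = f(rotate(v, theta))
    rhs = rotate(f(v), theta)
    return all(abs(a - b) < tol for a, b in zip(lhs, rhs))
```

A map that treats coordinate axes asymmetrically, such as squaring only the first component, fails this check, which is exactly the failure mode equivariant architectures are designed to rule out by construction.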